High temperature unfolding simulations of a single-stranded DNA i-motif 
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We present the results of high temperature 500 K Molecular Dynamics simulations of the DNA 
i-motif. The essential dynamics and the main unfolding pathways are compared to a biased meta- 
dynamics simulation at 300 K. Our results indicate a remarkable agreement of the concerted motion 
at both temperatures. The transition can be described by a few number of eigenvectors indicating 
a simple unfolding mechanism. Two main mechanisms for the unfolding pathway at 500 K can be 
detected which are in good agreement to the results of the biased simulation at 300 K. 
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INTRODUCTION 



High temperature simulations are common tools for 
the investigation of protein unfolding properties [l|-[3. 
Due to the inherent long timescales apparent in the un- 
folding processes [2, 5], a decrease in computation time is 
even nowadays still desirable. By the usage of elaborated 
temperatures which are typically above the protein melt- 
ing temperature Tm, a significant decrease of unfolding 
times can be observed [^]^. As it is often assumed, an 
Arrhenius-type behavior of unfolding kinetics is responsi- 
ble for this remarkable acceleration [6] while the pathway 
of unfolding is unchanged [4]. 

Over the years, the desirability of using elaborated tem- 
peratures has lead to several novel insights. It has been 
shown that the properties of the solvent related to the 
chosen water model at higher temperatures are in agree- 
ment to experimental results [2]. Hence the validity of 
the unfolding pathway is not influenced by spurious arte- 
facts of the environment. Furthermore it is known that at 
very high simulation temperature the unfolding process 
occurs more rapidly due to a simultaneous breaking of 
several intramolecular interactions such that intermedi- 
ate states cannot be detected considerably [6]. To avoid 
this drawback, the concept of an upper critical temper- 
ature has been introduced [6]. Carrying out simulations 
below this temperature leads to an enhanced sampling 
of intermediate states such that the complete unfolding 
pathway can be studied in detail. Hence the transforma- 
tion of high temperature to low temperature pathways is 
methodologically validated although a detailed compari- 
son is often missing. 

The appHcation of elaborated temperature unfolding sim- 
ulations can be also useful to study the unfolding mecha- 
nisms for specific DNA and RNA structures due to large 
free energy barriers. Prominent examples for these con- 
figurations are non Watson- Crick like structures in DNA 
like the G-quadruplex structures and the i-motif pzLfll]. 
The first one is formed by guanine (G) rich sequences 
[sl while the latter is present in more cytosine (C) rich 
strands of DNA [3] . Although binding seems to be unfa- 



vorable for these structures, stability is achieved by a pro- 
ton mediated cytosine binding between different strands 
or regions of the sequence resulting in a C-CH+ pairing 
BSE El. Hence an acidic environment is able to 
spend a proton which leads to a hemi-protonated cyto- 
sine mimicking an ordinary C-G binding as it is present 
in common DNA. It becomes clear that these structures 
have been found to be only stable at slightly acidic to 
neutral conditions resulting in pH values ranging from 
4.8 to 7.0 0, 0, [H. Furthermore i-motif structures 



show a remarkable stability [l2| and have been found 
as tetrameric and double stranded complexes between 
different molecules although they are also occurring in 
single stranded DNA A picture of the C-CH+ com- 
plex and the corresponding single stranded i-motif with 
its sequence is shown in Fig. [TJ 

However, a detailed investigation of the function in the 
human cell is still missing although the applicability as 
a new class of possible tar gets for cancers and other dis- 
eases has been discussed llsL [l3 . In contrast to this 




FIG. 1: C-CH+ pairing (bottom left) which is responsible for 
the formation of the DNA i-motif (top left) with the corre- 
sponding sequence (right) where C,T and A denote cytosine, 
adenine and thymine. 

lack of knowledge, the usage of the i-motif in modern 
biotechnology has experienced an enormous growth over 
the last years [10]. Since the i-motif becomes unstable 
at basic pH values, a systematic decrease and increase of 
protons in the solution by changing the pH value leads 
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to a reversible folding and unfolding mechanism. It has 
been shown that the unfolding and folding occurs on a 
timescale of seconds lo|, 12 1. Due to these properties the 
presence of i-motifs in modern nano applications is obvi- 
ous. 

Technological examples for the i-motif are given by 
molecular nanomachines [lol, 15 1, switchable nanocon- 
taine rs ^16i| . sensors to detect the pH value inside living 
cells [17|, building materials for logic gate devices [18| 
and sensors for distinguishing single walled and multi- 
walled carbon nanotube systems |19| . 
Recently it has been assumed [2o| , that the grafting den- 
sity massively influences the structure of an i-motif layer 
due to steric hindrance in the underlying folding and un- 
folding pathway. Hence a detailed investigation of the 
folding respectively unfolding pathway especially for high 
grafting densities in nanodevices of the i-motif and under 
physiological conditions is of prior importance. 
In this paper we present the results of high tempera- 
ture 500 K Molecular Dynamics simulations concerning 
the unfolding mechanism of a maximum unstable sin- 
gle stranded DNA i-motif structure without hemiproto- 
nated cytosine. Our results indicate a fast initial decay 
of the i-motif leading to stable conformations. By de- 
tailed investigation of the essential dynamics of a biased 
metadynamics simulation, we were able to identify the 
main unfolding pathways at 300 K. The good agreement 
to the high temperature unfolding pathway is validated. 
Our findings can be related to the calculated underlying 
free energy landscape which allows to identify the most 
stable conformations. 

The paper is organized as follows. In the next section we 
present the theoretical background of the metadynamics 
method and the essential dynamics methodology. Then 
we introduce the numerical details and present the results 
of our simulations. We conclude with a brief summary. 



METHODS 
Metadynamics 

The system we consider is described by a set of coordi- 
nates S{x) evolving under the action of dynamics follow- 
ing the trajectory S{x{t)) and described by a canonical 
equilibrium distribution at temperature T. The set of 
coordinates S{x) may include atomic positions or angles 
as well as any other auxiliary collective variable repre- 
senting the characteristics of the system. 
If the system shows metastability, some regions separated 
by large energy barriers cannot be explored by the evolu- 
tion of the trajectories for low temperatures in a reason- 
able simulation time. Thus reflecting the original idea of 
metadynamics, an additional potential energy at specific 
constant times ^1,^2, ...t^ is applied on the trajectory x{t) 
to overcome the barriers and to accelerate the rare events 



|2l| . Typically this potential energy is given by Gaussian 
hills which are summed at time t to 



VAS{x),t)=uj exp 



/\\2 



(1) 

where s^{t^) = S^{x{t)) are the d collective variables, 
u denotes the Gaussian height and 6s is the Gaussian 
width. By applying this procedure subsequently, free en- 
ergy minima can be escaped and the unfolding pathway 
can be identified [ii'-S] . Furthermore it has been shown 
[22] that the added potential energy resembles the un- 
derlying free energy 



lim V^{s,t) - -F{s) 



(2) 



after a lon ^ si mulation time. To overcome certain draw- 
backs [221.125! , we use a recently published variant of 
the metadynamics method [25|. Within this method, the 
metadynamics potential is calculated on a grid leading to 
a constant calculation time in contrast to ordinary meta- 
dynamics. The refinement of the resulting landscape is 
achieved by a histogram reweighting procedure of sev- 
eral biased simulation runs [25, 26]. A projection scheme 
allows to change the collective variables a posteriori [25| . 



Essential Dynamics 

The main part of a successful calculation relies on the 
choice of appropriate collective variables on which the 
free energy landscape and the resulting unfolding path- 
way is spanned. Well-suited collective variables are the 
essential eigenvectors of the system [27]. It has been re- 
cently demonstrated that the application of eigenvectors 
as collective variables results in adequate descriptions 
of the free energy landscapes in ordinary Metadynam- 
ics computations 28, 29|. Thus we follow this approach 
due to the assumption that nearly all relevant motion is 
captured in the first eigenvectors of the system [27]. In 
the following we give a brief description of the method. 
The essential eigenvectors can be calculated by the super- 
imposed coordinates f of N atoms of the system which 
build the covariance matrix C via 



C =< (r- < r >)(r- < r >)^ > 



(3) 



where i, j = 1,2,... 3A^ and the brackets < . . . > denote 
the reference value. The diagonalization of C leads to 



EAE 



(4) 



where E is a matrix of eigenvectors and A is a matrix 
of eigenvalues marking the positional fluctuations. Sort- 
ing the eigenvalues in decreasing order allows to identify 
the largest positional fluctuations by the first eigenvec- 
tors which form the essential subspace with all important 
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structural transitions j27[. The projection at time t on 
the i— th eigenvector is then defined by 



Pi{t) = {r{t)- <f>)-ei 



(5) 



with the specific eigenvectors ranging from i = 1, 2 . . . 3A^ 
28|,[29|. 

To compare the motion of several subsets of eigenvectors 
for different trajectories, a class of functions has been 
introduced 30|-33|. The surface overlap between 
N eigenvectors jUi from one set and M eigenvectors from 
another subset i7j can be calculated by the squared inner 
product 



N M 



(6) 



which gives an estimate for the similarity of two sets. If 
the spaces are completely orthogonal the relation gives 
= in contrast to identity which results in O^^ — 
1. Additionally the inner product matrix can be calcu- 
lated by 



(7) 



which allows to derive a detailed comparison between the 
motion of different subset or trajectories and the corre- 
sponding eigenvectors. 



SIMULATION DETAILS 

We have performed our Molecular Dynamics simula- 
tions of the i-motif in explicit TIP3P solvent at T=300 K 
and T=500 K by the GROMACS software package [35j. 
The single DNA strand consists of 22 nucleic acid bases 
given by the sequence 5' - CCC - [TAA - CCC] 3 - T - 3' 
where T, A and C denote thymine, adenine and cyto- 
sine. We modeled this structure which is directly re- 
lated to the sequence used in [l5| by slight modifications 
of the PDB entry lELN [36^. Our periodic simulation 
box contains 5495 TIP3P water molecules and 22 sodium 
ions to compensate the charging of the DNA. All inter- 
actions have been calculated by using the ffAMBER03 
force field [s^- After energy minimization, the initial 
warm up phase of 1 ns has been performed by keeping 
the position of the i-motif restrained. The cubic simu- 
lation box with periodic boundary conditions has a di- 
mension of (5.41 X 5.41 X 5.41) nm. We applied a Nose- 
Hoover thermostat to the system where all bonds have 
been constrained by the LINGS algorithm [35|. Electro- 
statics have been calculated by the PME algorithm (sHj 
and the time step was 2 fs. The calculation of the free 
energy landscape has been performed by the method pre- 
sented in 



. The biased metadynamics simulations 
at 300 K for the investigation of the unfolding pathway 



have been conducted by the program plug-in PLUMED 
(39! . The Gaussian hills were set each 2 ps with a height 
of 0.25 kJ/mol and a width of 0.25 nm. The correspond- 
ing reaction coordinates for the biased energy are the 
distance between nucleobases Gl and T22 and the dis- 
tance between the center-of-mass for the G1-T22 to the 
All nucleobase as they were also used in [38]. The final 
eigenvector free energy landscapes have been calculated 
by the projection scheme proposed in [25] where the land- 
scape has been refined via 15 biased simulations of 10 ns 
at 300 K and 500 K. A detailed protocol for the explicit 
calculation of the eigenvector free energy landscape in 
correspondence to the projection scheme is presented in 



25, 3 



For a detailed investigation of the unfolding mechanism 
we furthermore conducted five 500 K simulations and five 
300 K unbiased simulations each with 10 ns duration and 
averaged the resulting values. The root-mean square de- 
viation is used for the study of the unfolding pathway 



RMSD{t,to) 



\ 



1 

iV2 



N N 



^^(^^jW- <n,(to) >) 



(8) 

with the positions fi of N atoms for different times and 
the distances r^j between two atoms where < fi{to) > is 
the reference or average position. Due to reasons of sim- 
plicity, we paired each three nucleobases in one group re- 
sulting in the sequence GYS1-TAA1-GYS2-TAA2-GYS3- 
TAA3-GYS4-T. We chose the initial structure as refer- 
ence position and calculated the RMSD for the 500 K 
and the 300 K unbiased simulations for several groups. 



NUMERICAL RESULTS 

We start this section by presenting the values for the 
root-mean square deviation of the unbiased simulations 
at 500 K and 300 K in Fig. El The initial opening of the 
i-motif appears on a timescale up to 200 ps for the 500 K 
simulations. This is obvious by regarding the results for 
the slightly increasing RMSD until it becomes nearly con- 
stant from 200 ps to 400 ps. After 400 ps a stable struc- 
ture can be observed with minor fiuctuations indicated by 
a constant RMSD of the TAA1/TAA3 which is present 
up to 1.2 ns. As it has been discussed in [38] for 300 K, a 
reason for this stable conformation is a constant number 
of hydrogen bonds between the TAA1/TAA3 group. 
Significant fiuctuations in the GYS1/GYS3 and the 
GYS2/GYS4 group after 600 ps can be observed which 
may be responsible for the opening of the TAA1/TAA3 
group on a timescale up to 3 ns. Finally on longer 
timescales than 3 ns the structure unfolds into a fully 
extended strand in the 500 K simulations as it can be 
seen by the large RMSD of all groups. It is remarkable 
that the 300 K simulations do not reach the fully un- 
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FIG. 2: Root-mean-square deviations for different nucleobase 
combinations in the 500 K simulation and for the complete 
molecule at 300 K and 500 K (inset). 

folded structure as the inset of Fig. [2] indicates. Hence it 
becomes clear that the usage of biasing methods at these 
lower temperatures is required. 

A more detailed analysis of the unfolding pathway can be 
performed by the investigation of the essential dynamics 
[27| . To study the concerted motion of the molecule and 
to indicate the main structural transitions we analyzed 
the essential eigenvectors of the system. Most of the mo- 
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FIG. 3: Normalized cumulative positional fluctuation of the 
eigenvectors for the 500 K simulations and the corresponding 
relative positional fluctuation (inset). 

tion at 500 K which is about 66% is expressed in the first 
two eigenvectors as Fig. [3] indicates. Nearly 57% of the 
overall motion is incorporated in eigenvector 1. The es- 
sential subspace is formed by the first eight eigenvectors 
with roughly 88%. The relative positional fluctuations 
again indicate that eigenvector 1 is dominating the over- 
all motion with significant contributions from eigenvec- 



tors 2 and 3. The presence of all eigenvectors larger than 
8 is negligible. It can be concluded that this represents 
an unfolding pathway which can be effectively described 
by a small number of eigenvectors. 

The motion of the first two eigenvectors at 500 K is pre- 
sented in Fig. m Eigenvector 1 mainly describes the 
opening of the end-to-end distance by a stretching mode. 
Eigenvector 2 can be identified as the initial opening of 
the i- motif by the relaxation into a planar structure. 
To compare the unfolding pathways of 500 K to lower 
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FIG. 4: The motion of the eigenvectors 1 and 2 at 500 K. The 
arrows illustrate the movement along the eigenvectors. 

temperatures, we performed a 50 ns met adynamics sim- 
ulation at 300 K. This allows to overcome energetic bar- 
riers and to identify the main unfolding pathways which 
can be compared to the high temperature simulations by 
an analysis of the essential dynamics. 
For this we investigated the eigenvectors for the biased 
met adynamics run at 300 K, the unbiased 500 K and 
300 K simulations. The projection is performed along 
the unbiased 500 K subset of eigenvectors 1 and 2 on a 
timescale of 10 ns. 

Fig. [5] illustrates the importance of elaborated tem- 
peratures respectively the usage of biasing potentials to 
discover the unfolding pathway. It can be seen that the 
biased simulations at 300 K resembles the high tempera- 
ture trajectory in good agreement although it is limited 
to a smaller region. This is also true for the unbiased 
300 K simulation which is restricted to a very small sub- 
space due to occurring trapping in a specific configuration 
whereas the main unfolding pathway is identical. The 
results indicate that the unfolding mechanism is remark- 
ably accelerated by elaborated temperatures or biasing 
methods and obviously resembles the real low tempera- 
ture unfolding pathway. 

The results above are additionally validated by the cal- 
culation of the squared inner product O.. (Eqn. [6j) for 
the complete biased 300 K and the unbiased 500 K sim- 
ulations and the eigenvector inner product o. . which is 
presented on the right side of Fig. [5l The squared inner 
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FIG. 5: Left: Unfolding pathways for the eigenvectors 1 and 2 
after 10 ns for the unbiased 500 K simulations and the biased 
300 K simulations (top left) and the unbiased 300 K simu- 
lations (top right). Right: Inner product matrix o.^. for the 
biased 300 K simulations and the unbiased 500 K simulations. 



product for the first three eigenvectors is O33 = 0.787 
with a square root of 0.887 which indicates a nearly iden- 
tical subspace of the first three eigenvectors at different 
temperatures. These values are supported by the inner 
product matrix o. . . Eigenvector 1 has an inner product 
of 0.886 indicating a remarkable identity of both trajec- 
tories. This is also true for eigenvectors 2 and 3 with 
values of 0.841 and 0.753. Hence the overall motion of 
the first three eigenvectors which describe 73 % of the 
concerted motion (Fig. |4|) is directly comparable at both 
temperatures. This proves the assumption that the usage 
of elaborated temperatures results in unfolding pathways 
that are comparable to those derived at lower tempera- 
tures. 

The results for all unbiased 500 K simulations and 
the complete biased 300 K simulation trajectory are pre- 
sented in Fig. [6l It is obvious that similar regions are ex- 
plored at both temperatures and the overall displacement 
at both temperatures is comparable. Several conforma- 
tions can be identified along the trajectories starting with 
the initial i-motif. For a detailed analysis of the stability 
of the presented conformations and the unfolding path- 
way, the underlying free energy landscapes have to be 
calculated. This allows to decide whether a conforma- 
tion is stable or represents a global minimum which can 
be directly related to the detected unfolding path. The 
free energy landscapes at 300 K and 500 K are presented 
in Fig. [71 

It is obvious that the most stable conformation is given 
by a stretched structure at 500 K. This is in contrast to 
300 K where it can be seen that two stable hairpin con- 
formations (b) and (c) represent the lowest free energy 
conformations. It can be assumed that the local ordering 
of water molecules is mainly responsible for this effect due 
to the fact that it is largely dominated by electrostatic in- 
teractions of the DNA. At larger temperatures, the domi- 
nance of the Coulomb interactions in comparison to ther- 
mal energy becomes smaller such that additional solvent 
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FIG. 6: Explicit trajectories for all 500 K simulations (top) 
and the biased 300 K (bottom) simulation. Typical confor- 
mations at certain regions and both temperatures are shown 
as snapshots and denoted by letters. 



and molecular configurations are accessible. Finally this 
results in a change of the entropy of these configurations 
such that the free energy minima are adjusted by entropic 
contributions. Although the minima and the barriers dif- 
fer in their height and their relative position, the overall 
shape of the landscapes is similar. Entropic contributions 
as discussed above may be responsible for these slight de- 
viations. The good agreement becomes also obvious by 
studying the unfolding pathways and the corresponding 
free energies. Thus it can be concluded that comparable 
unfolding pathways can be observed as it was also dis- 
cussed above. However, the realization of the unfolding 
pathway in an unbiased simulation at lower temperatures 
is dependent on the energy barriers that are occurring on 
the pathway. As Fig. [71 indicates, the general movement 
of the unfolding process is comparable at both tempera- 
tures. 

The hairpin conformations (b) and (c) at 300 K belong to 
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FIG. 7: Free energy landscape at 300 K (top) and 500 K 
(bottom) for the eigenvectors 1 and 2 with the corresponding 
determined unfolding pathways. The lines correspond to free 
energy differences of 1.5 kcal/mol. 



free energy minima and are the reason for the stagnating 
root-mean square deviation presented in Fig. O Hence 
the overcome of the energetic barriers for these structures 
is cumbersome in a 300 K unbiased simulation. Thus it 
can be concluded that the unfolding pathway at lower 
temperatures is mainly realized by a transition into con- 
formations (b) and (c). In contrast to 300 K, the free 
energy landscape at 500 K indicates these conformations 
as local minima with low barriers such that the transi- 
tion into a fully extended strand representing the global 
energy minimum can be realized. 

Starting with the i-motif (a) the initial opening of one end 
with the groups CYS3-T is sufficient for the transition in 
the global minimum (b) at 300 K. Overcoming this sta- 
ble configuration results in two possible pathways. The 
first possibility is the transition into the neighbored min- 
imum which is given by a planar hairpin structure (c). 
The other pathway follows the opening of the strand with 
groups CYSl and TAAl into a planar s-shaped structure 
(d). The final transition is then realized by an unfold- 



ing into the fully extended strand (e). However, the last 
pathway is better defined for 300 K due to the fact that 
the completely unfolded region is energetically more fa- 
vorable for the 500 K simulations as the free energy land- 
scapes indicate. However all configurations have been 
clearly visited in the 500 K simulations. 
A further pathway is the unfolding into a twisted struc- 
ture (g) where the strand with groups CYSl-T follows 
a rotational motion. After this move, the ends of the 
strand relax into structure (f). By a second process, the 
twisted configuration transforms into conformation (d) 
which is also present in the first unfolding pathway at 
300 K. The final transition into the stretched structure 
(e) is identical to the first unfolding pathway. At 500 
K this pathway visits conformations (g) until the config- 
uration fully unfolds on different pathways as discussed 
above. 



SUMMARY AND CONCLUSION 

We have simulated the unfolding of the DNA i-motif 
via Molecular Dynamics simulations in explicit solvent 
at 500 K and at 300 K. Our results indicate that the 
initial structure of the i-motif vanishes on a timescale 
of a few nanoseconds. Thus our results are in agree- 
ment to experimental observations which indicated that 
an unprotonated i-motif is not stable at room tempera- 



ture 

We have shown that unfolding pathways can be investi- 
gated either by the usage of elaborated temperatures as 
well as biasing potentials. The occurring trajectories are 
similar as a detailed analysis of the essential dynamics 
indicated. Hence we have presented a method to prove 
the transformation of high temperature unfolding trajec- 
tories onto lower temperatures. Due to this similarity we 
can identify two main unfolding pathways for the DNA 
i-motif at both temperatures. The most relevant one at 
300 K visits stable hairpin configurations from which it 
unfolds only due to thermal activation or the presence 
of biasing potentials into an extended structure in a rea- 
sonable simulation time. The other one includes torsional 
motion and is energetically less favorable. 
The calculated free energy landscapes at 500 K and 300 
K are similar in their global shape which leads to com- 
parable unfolding pathways. Hence it can be concluded 
that this property is the main reason for the success of 
high temperature simulations. Finally our results indi- 
cate that the analysis of the unfolding pathways is only 
reasonable in combination with the calculated free en- 
ergy landscapes. This becomes specifically important by 
comparing the global energetic minima which may re- 
sult in only partly visited unfolding pathways. Hence it 
can be concluded that at lower temperatures the i-motif 
mainly unfolds into the discussed hairpin configurations. 
The fully extended strand represents the global energy 
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minimum only at higher temperatures. Finahy it is obvi- 
ous that the detailed unfolding pathway is better defined 
for 300 K although the main configurations are visited at 
both temperatures. 

As it has been discussed, the determination of the unfold- 
ing pathway of the i-motif is of prior importance regard- 
ing the applicability in nanomachines p^, 115|, |16|, I20j . 
Hence our results allow to improve the applicability of 
i-motifs as functional materials in nanotechnology due 
to the explicit knowledge of the unfolding pathway and 
its stable conformations. Finally the investigation of the 
unfolding properties may also lead to a more detailed 
knowledge about the molecular mechanisms in the hu- 
man cell. This is in particular interesting due to the fact 
that the knowledge of the function of the i-motif is still 
missing. 
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